DOCUMENT RESUME

ED 137 349                                        TM 006 150

AUTHOR        Siracuse, Kathleen
TITLE         Measuring the Achievement of Groups in Compensatory
              Education: An Alternative Testing Framework.
PUB DATE      [Apr 77]
NOTE          25p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (61st, New
              York, New York, April 4-8, 1977)

EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   *Achievement Tests; *Compensatory Education;
              *Criterion Referenced Tests; Diagnostic Tests; *Group
              Tests; *Item Banks; *Item Sampling; Language
              Programs; Norm Referenced Tests; Norms; Reading
              Programs; School Districts; Secondary Education;
              Secondary School Mathematics; Standardized Tests;
              Student Testing; Test Construction; Testing Problems;
              Testing Programs; Test Interpretation

ABSTRACT

An achievement testing framework is being developed
by the Los Angeles Unified School District to assess the educational
progress of 14,000 secondary level compensatory education students
with something other than standardized tests. The technique of
multiple matrix sampling was applied to the use of large item domains
in the subject areas of reading, mathematics, and language
development. The domains of items were built locally on "content
maps" which describe the skills actually taught in the compensatory
education program. The process of constructing such frameworks is
transportable to other programs. The possibility of obtaining
normative data from the framework is being explored. (Author)



Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.






MEASURING THE ACHIEVEMENT OF
GROUPS IN COMPENSATORY EDUCATION:
AN ALTERNATIVE TESTING FRAMEWORK*



Author: Kathleen Siracuse

Los Angeles Unified School District
Research and Evaluation Branch
ESEA, Title I Programs






*Prepared for presentation at the annual meeting of the American Educational Research
Association, April 4-8, 1977, New York City



Interest in criterion-referenced testing (CRT) has accelerated in recent years,
particularly as individualized approaches to learning have become more prevalent.
This interest has lessened the emphasis placed on programs based only on broad
educational goals. Concepts have now been added to criterion-referenced testing
that suggest far-reaching, highly adaptable uses for group as well as individual
assessment. These concepts, as described by Millman (1972), include the establishment
of large domains of items which collectively represent a proficiency standard
in a subject area, and the construction of a matrix based on the total domain of
items (or item pool). The matrix, when used horizontally, provides a set of either
homogeneous or heterogeneous items (in difficulty and format) that test a single
objective. When used vertically, the matrix may be used to establish subtests of
items which sample complete cross-sections of items across the entire domain of
objectives for a subject area.

When the subtests are used as detailed by Shoemaker (1974) and administered
in equal numbers to approximately equal, randomly selected portions of a student
population, they yield group scores which provide group diagnostic information on
examinees' mastery of the objectives in a domain. As in the case of individual
scores obtained by traditional uses of CRTs, the group is compared to a standard
of achievement set by the criteria of the domain of objectives. The group is not
compared to other groups on broad educational goals as in norm-referenced tests.
(See also Shoemaker, 1975.)
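
The random administration of subtests described above can be illustrated with a minimal sketch. The function name, the student numbering, and the fixed seed are assumptions made for the illustration only; the paper does not describe the procedure actually used to assign subtests.

```python
import random


def assign_subtests(student_ids, n_subtests, seed=None):
    """Randomly assign each student one subtest.

    Students are shuffled and then dealt out to subtests 1..n_subtests in
    turn, so the subtest groups differ in size by at most one student.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {student: (i % n_subtests) + 1 for i, student in enumerate(shuffled)}


# Hypothetical example: 1,000 students and 10 subtests.
assignment = assign_subtests(range(1, 1001), n_subtests=10, seed=1977)
group_sizes = [list(assignment.values()).count(s) for s in range(1, 11)]
```

Shuffling before dealing keeps the groups equal in size while leaving the choice of which student receives which subtest to chance.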

A New Framework for Achievement Testing 

By combining the concepts of criterion-referenced tests and multiple matrix
sampling as described above, it has been possible to establish criteria for
competency in the reading, mathematics and language development programs for the
secondary level of compensatory education in Los Angeles, California. These
criteria have served to define the domains of program objectives. Items were
purchased or generated that represented competency in meeting the objectives.
The next step was to establish a matrix for each subject area as described,
construct subtests and administer an experimental testing framework that is
being analyzed and revised to match the instructional program. The testing
framework and the program are intended to become one and the same. The results
will ultimately provide information on how well the actual objectives of the
programs are being met, and suggest priorities for instructional decisions about
the directions the overall programs should be taking.

The value of such results is that they offer group performance information
required for reporting to funding agencies of specially-funded programs. In
addition to this, such group diagnostic scores serve to re-educate the community, the
press, the parents, the educators and the source of funding about the uses of
test scores. Since norm-referenced tests do not offer group diagnostic information,
but only a comparison of groups to one another across diverse populations
and on generalized skills, they are not useful in making competency-based
educational decisions for priorities in program content. The much more specific
information provided by group assessment with domain-referenced tests using multiple
matrix sampling is far more enlightening, since it specifies what the group can
and cannot do as the result of the objectives implemented in a program.

Normative Data From the Framework?

A further possible use of criterion-referenced tests for group assessments exists
in the norming of the test results. As pointed out by Roudbush (1974), CTB/McGraw-Hill
has already conducted research to determine the relationship between
norm-referenced and criterion-referenced tests. Their initial findings were that
well-written, comprehensive, criterion-referenced tests may be able to produce
norm-referenced test results about as well as norm-referenced tests. This relationship
will be investigated in the present project as a component of interpreting
results obtained from administering the framework.

Over a three-year period, framework test results will be compared to scores
achieved by the same population on appropriate levels of the CTB/McGraw-Hill
Comprehensive Tests of Basic Skills. Should the framework prove by this comparison
to produce normative data, the way would then be open to eliminate redundant
testing programs by using the framework for both criterion-referenced and
normative data. This would help to meet the need for both types of evaluation that
exists in specially-funded educational programs.

Uses of the Framework for Individual Diagnostic Purposes

The establishment of the matrix based on a large item pool also offers an
opportunity for individual diagnostic measurements. As mentioned above, the matrix,
when sliced horizontally, contains many items covering a single objective in a
subject area. Sets of diagnostic tests on single objectives or on small numbers
of objectives may then be constructed. This increases the flexibility of the item
pool or domain and provides educators with even more detailed assessments of
student performance based on the same domain of objectives.

Issues of Test Security 

By maintaining consistency between the program objectives and the evaluation
criteria in all three forms of evaluation mentioned above, the need for traditional
test security is eliminated. The program content and the test content become
identical in what they require a student to demonstrate. Then, because the item
pool is large (several hundred items) and because there are multiple forms of
the test, it is no longer possible for the test to be memorized. A student may
get any one of the various forms of the subtests during an examination.



Instructional Applications for the Framework 

Teachers can use the testing framework's domain of objectives to plan program
content. Results from pretests will show a class's and a grade level's
performance on all the objectives. The idea will then be actually to teach to the
framework's item format and difficulty, since it has been the program's objectives
which determined the item pool content in the first place.

Training Teachers to Use the Framework

It will be necessary to conduct training sessions for teachers in the use of data
from these domain-referenced achievement tests. This need for training was
described by Shoemaker (1974) as being an essential component of converting to
such a testing program. He states the following: "The critical ingredient here
is creating a domain-referenced achievement testing atmosphere within the
classroom and reorienting the teachers and students so that 'teaching specifically to
the test,' or item domain, is perfectly acceptable and the primary goal of
instruction." (p. 157) Test results from the framework are intended to provide
group information specific enough about instructional needs that a teacher would
be able to focus directly on skills and concepts in need of strengthening.

Flexibility of the Item Pools 

An additional advantage of building a testing framework on an item pool or domain
is that it allows an entire district (perhaps even a state) to unify itself around
a pool of items which form a composite of their various programs. Individual
schools (or districts) may then select those areas of the domain for which they
wish to be held responsible. In other words, they may identify the portions of
the domain they are actually trying to reach. When tests are administered across
the entire domain, the results show how well their students are doing on areas
that are not being formally taught in their particular program as well.



The ground has already been broken in this district for such assessment to take
place on a broader scale. The Los Angeles Unified School District (LAUSD) is a
large district (732,000 students) with a composite of specially-funded programs
serving 142,248 students as of school year 1975-76. Approximately 14,000 of
these students are involved in the secondary public schools portion of the
programs. The state evaluation office for secondary ESEA Title I funded programs
has granted LAUSD permission to compose its own version of a testing framework
for language development. The framework is currently being built and is based on
a language development curriculum written by educators in the secondary
compensatory education program especially for the needs of the students involved. Again,
the domain of objectives is the same for the program and the testing framework.

The acceptance of this type of data at the state level for specially-funded
programs is a valuable precedent to have set. It is hoped that its use by the state
to determine program effectiveness will serve to demonstrate the much more
comprehensive nature of criterion-referenced test data for assessment of group
performance as compared to the limited information obtained from norm-referenced
standardized tests.

It is further hoped that the success of the framework would create an opening for 
the possibility of state-wide item pools and testing frameworks in subjects taught 
as components of specially-funded programs. 

Applications of the Framework

The final product of this project is expected to be a testing framework for
assessment of group performance in the subject areas of reading, mathematics and language
development at the secondary level in compensatory education programs in Los
Angeles. It is fully expected that the framework will be transportable to other
districts conducting programs of a similar nature and with similar populations.



The actual process by which the framework is built can also be made transportable.
A model for the construction of testing frameworks for group performance in any
subject area and for any grade level will be developed. By using the model,
districts anywhere in the country will be able to establish testing frameworks to
match their programs. The actual format of the model has not yet been determined,
but it may consist of such components as:

1) a written description of the steps to follow, methods to use, and types
of personnel to involve, possible expenses, time tables and evaluation
designs to use.

2) audio-visual aids to illustrate the above.

3) a list of available consultant services.

4) an annotated bibliography of resources in professional literature that
relate to the construction and use of such frameworks.

Methods for the norming of comprehensive criterion-referenced test results will
also be included in the process model if the attempt to carry this out becomes a
successful component of the project.






Procedures for Constructing the Testing Framework

The process of designing and constructing the testing frameworks for assessment
of group performance on item domains is already well under way for the Los Angeles
secondary compensatory education programs.

Slightly different procedures were used for constructing each of the frameworks
for reading, mathematics, and language arts, offering an opportunity to compare
differences in approaches to some of the tasks involved. Basic tasks in each are:

1. Construction of Content Maps/Generation of Test Items

2. Scoring Procedures and Statistical Methods of Evaluating the Tests

3. Tryout and Use of the Tests

4. Revision/Refinement of Tests

Construction of Content Maps and Generation of Test Items

Alternative Approaches to Defining the Test Domain -- The domains of
objectives for the three frameworks were defined with three different
approaches, although similar types of personnel were used in each case.
The approaches, personnel, and working titles for the frameworks are
described below.

Framework for Assessment in Reading (FAIR):

A series of workshops was held in November and December 1975 in which
coordinators of reading programs at the school level and a reading
teacher from each of three inner-city junior high schools were
present. Also included were three reading-content advisors from
administrative offices in the district, a district-hired content
expert from Southwest Regional Laboratories, and an evaluator from
the research and evaluation branch for specially-funded programs
in Los Angeles.

The objectives in the domain for this framework (or content map) were
selected from a scope and sequence of the objectives for reading that
exists as part of a reading management program used in the district.
This program, entitled Developmental Reading Program (DRP), was
written by the district and published through Paul Amidon and
Associates in Minneapolis, Minnesota (Copyright 1972).

The portions of the scope and sequence that represented reading
program content in the three junior high schools were selected to define
what is meant by reading in the secondary compensatory education
programs in this city. A total of 46 objectives were adopted covering a
fairly wide range of skills. It was felt this would be necessary to
accommodate the wide range of below-grade achievement levels in the
programs while providing enough ceiling in the domain to discover what
students already know about what may not have been taught formally.

Framework for Assessment in Mathematics (FAIM):

The workshops held to determine the domain of objectives in mathematics
involved personnel in the same categories of positions as listed under
FAIR (see above). The only difference was that all positions dealt with
mathematics programs only. These workshops also took place in November
and December 1975.

What did vary was the means by which the domain of objectives (or
content map) was defined. In the FAIM workshop, its members generated the
objectives for mathematics based on their experiences with what is
actually taught and their knowledge of math content. They did not work
from a scope and sequence of objectives specified by a pre-existing
packaged management system. The mathematics domain was defined to
represent the range of skills and knowledges dealt with in the city's
secondary compensatory education mathematics programs. As in FAIR, a
certain amount of extra ceiling was added to the domain to detect
serendipitous learnings. Sufficient floor was allowed as in FAIR to
accommodate the range of below-grade achievement levels of students
in these programs.

Framework for Assessment in English Skills (FAIES):

The domain of objectives (or content map) for language development
skills and concepts was defined independently of the workshops for
FAIR and FAIM. The FAIES workshops were held during summer 1975 for the
purpose of defining and generating a curriculum management system for
this subject area. The entire package was designed to meet the
instructional needs of secondary compensatory education students in
language development.

ESEA Title I personnel (English teachers, school program coordinators,
and content advisors in language development) worked together to
define and write a domain of objectives that would describe the
language development program for the targeted student population.
The group has subsequently produced the curriculum package in the
form of a management system. The coordinators of the language
development programs have been trained in the use of this system, and began
implementing it in their school programs in the fall of this year as a
field test of the materials.

The process of working from the domain of objectives for this system
to produce a matrix of test items organized into subtests has
already been carried out. The selection of the items is discussed
in the next subsection.

Item Generating Procedures -- Test items for the three testing frameworks
were acquired in three basic ways. They were purchased from item banks,
used with copyright releases from publishers, or generated by workshop
members and by curriculum developers as in the case of items for FAIES. In
all cases items were selected or revised to have four answer choices.

Framework for Assessment in Reading (FAIR):

Test items were selected from two sources of previously existing
collections of items. These were the pre- and posttests for the Developmental
Reading Program (DRP) mentioned earlier, and the National Assessment of
Educational Progress released exercises published through the
Superintendent of Documents, U.S. Government Printing Office, Washington, D.C.,
July 1973. Items contained in the DRP were generated by that program's
developers.

The items from DRP were already coded to the objectives as part of the
management system of that program. It was a simple matter, then, to
locate and select test items to assess performance on objectives. In
some cases, however, items were revised for greater relevancy to
secondary level students. (The portion of the DRP used was originally
written as a program for elementary students.)

Items taken from the National Assessment of Educational Progress
materials were used only in the first experimental edition of FAIR.
Since these items have been normed, they were included only for the
purpose of comparison with similar items in the tests that came from
DRP. They are not included in subsequent editions.

Where there were insufficient numbers of items available in the DRP
materials or where existing items were inappropriately formatted,
workshop members generated test items based on the specifications of the
objective and by drawing upon their knowledge of the content of reading.

Framework for Assessment in Mathematics (FAIM):

CO-OP items in mathematics were purchased from the University of
Massachusetts. This item pool consisted of test items generated by public
school and university personnel to cover the mathematics domain typical
of "average" students in grades 4 - 9.

These items had full copyright releases on them and covered a wide
range of skills and knowledges with large pools of items. Since the
items were labeled to indicate the skill represented by them, workshop
members were able to select those that matched the objectives in the
domain for FAIM. It was discovered, however, that several of the items
picked were improperly written. Corrections of errors were made by
workshop members before the printing of the first experimental edition.
In some cases, this correction necessitated the writing of a completely
new item. The new items were modeled after the intent of the original
items and formatted similarly if appropriate.

Additional items were purchased from the Instructional Objectives Exchange,
or IOX, at UCLA. These also had full copyright releases. Not many of
these items were used since few of them matched the domain of objectives
developed for FAIM.

Framework for Assessment in English Skills (FAIES):

In the summer workshop for FAIES, curriculum management system materials
developed by the workshop members for language development instruction
were available. These curriculum developers used the objectives they
had written and their extensive knowledge of the content of the subject
matter to select samples of items from the curriculum materials. These
samples served as models for themselves and other writers to generate
items for the objectives with some consistency. After the items were
written, fellow workshop members critiqued the ability of the items to
actually test the skill described in the objective.

Of the three methods used for selection of items for the framework, the one
used for FAIR has proved most successful. The fact that the items were part
of or based on a proven management system already field tested and in use in
the district seems to have been beneficial. The use of FAIR in field
testing has produced the least amount of criticism of items for their
appropriateness to the age and ability levels of students.

Still to come for all three portions of the framework are workshops in
which teachers and various content experts will critique test items for
validity and for racial, ethnic and sexual bias.

Each set of tests will also be subjected to a critique by students for the
purpose of gaining ideas on content for test items that would be of interest
to the age level of the students taking them.

Critiques on the items in FAIM to date indicate that teachers feel many of
the items are too difficult for students in the program. This raises an
interesting issue, since it was teachers and coordinators from the same
program who selected the items for FAIM. The conclusion need not be that
this method of item selection is invalid, but simply that pitfalls are
involved. Those selecting items may have a tendency to overestimate student
achievement levels and provide items beyond the capability of the students.



Those administering the tests, in reacting to the phenomenon of
accountability, may have a tendency to underestimate student achievement, and
suggest the elimination of items that may in fact be within the reach of
a program's population.

The thing to be aware of here is that both tendencies occur, and can be
reduced at least by cautioning those selecting the items about
overestimating student abilities. Data obtained from field testing provide
the information necessary for adjusting the difficulty of items during
the revision process. A range of difficulty can then be provided to
challenge but not overwhelm students. The idea is to have enough ceiling
on the framework to measure growth in the achievement of the groups but
also to provide enough bottom and middle range to comprehensively diagnose
the group's performance on the actual heart of the program.

The method of item selection for FAIES has proved unsuccessful, but not
because the actual method used is inherently bad. The selection of items
from a collection of materials that had not been field tested simply produced a set of
tests that did not closely enough define the parameters of the language
development programs being tested. Also, since the items used were
actually drawn from much shorter pretests for a management package, their
format did not lend itself to longer tests. As a result, it is necessary
to conduct workshops involving various content experts (teachers,
coordinators of reading programs, and district curriculum consultants) during
which the content map (or domain of skills and concepts) for the language
development component for compensatory education at the secondary level
will be defined. Once these are defined, the existing items will have to
be re-written or a new source of items located.

One of the crucial steps left out of the item selection process for FAIES
was the involvement of content experts who actually deal with the program
students in deciding which skills and concepts are definitive of the
program's objectives.



Test Construction Procedures -- In the case of all three frameworks,
subtests were constructed in the same manner. The items in the respective
domains were arranged in a matrix as shown below. Items for a single
objective were arranged in the row for that objective on the matrix. The rows of
objectives represent the subdomains of single skills or concepts for the
framework and for the program to be evaluated.

Items testing a single objective are arranged randomly in a row. The
subtests are built by cutting the matrix vertically and using all the items in
a column.



                           SAMPLE MATRIX*

                              Subtests                      Items in
Objectives    1   2   3   4   5   6   7   8   9   10        subdomain

     1                                                       1 - 10
     2                                                      11 - 20
     3                                                      21 - 30
     4                                                      31 - 40
     5                                                      41 - 50
     6                                                      51 - 60
     7                                                      61 - 70
     .                                                      etc. through
    50                                                      item 500

                              Figure 1

*For a hypothetical domain of 50 objectives having 500 items in the item pool.
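
The hypothetical matrix of Figure 1 can be sketched in a few lines of code. This is an illustration only: the function names are assumptions, and items are numbered consecutively within each row for readability, whereas the paper states that items within a row are arranged randomly.

```python
def build_matrix(n_objectives, n_subtests):
    """Return the item matrix: row r holds the item numbers for objective r + 1."""
    return [
        [obj * n_subtests + col + 1 for col in range(n_subtests)]
        for obj in range(n_objectives)
    ]


def subtest(matrix, k):
    """Vertical cut: subtest k (1-based) takes one item from every objective."""
    return [row[k - 1] for row in matrix]


def objective_items(matrix, j):
    """Horizontal cut: every item written for objective j (1-based)."""
    return matrix[j - 1]


# The hypothetical domain of Figure 1: 50 objectives, 10 subtests, 500 items.
matrix = build_matrix(n_objectives=50, n_subtests=10)
first_subtest = subtest(matrix, 1)
objective_seven = objective_items(matrix, 7)
```

Each subtest produced by a vertical cut contains 50 items, one per objective, while a horizontal cut returns the 10 items available for a single-objective diagnostic test.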



Indexing Test Items to Curriculum for Diagnostic Purposes -- In all three
cases of the testing frameworks, the objectives for the test domain were
given code numbers. Since the domains of objectives are descriptive of the
respective programs involved, all test items are also coded to indicate the
objective for which they are written. Then, by referring to these code
numbers when reporting test results, it will be possible to report
diagnostic information to teachers on students' abilities to perform on the
objectives in the entire domain. (This relates to students' abilities to complete
entire subtests. See the discussion of test length below.)

Scoring Procedures and Statistical Methods of Evaluating the Tests

Methods of Estimating Criterion Scores -- As mentioned earlier, one of the
primary goals of implementing the test frameworks is to be able to
determine level of achievement of the secondary compensatory education students
in Los Angeles over the three item domains for reading, mathematics and
language development.

To accomplish this, the technique of multiple matrix sampling described
earlier is being and will be used to administer both experimental and final
editions of the tests. The subtests of items from the three domains are
intended to be administered to randomly selected subgroups of students in
grades 7 - 12. Although different subgroups take different subtests, it is
pointed out by Shoemaker (1974) that the parameters estimated will be those
that would have been obtained if all students had been tested on every item
in the domain. It is, then, the ability of multiple matrix sampling to
estimate achievement on large domains of items that makes it so valuable in
assessing group achievement, since traditional norm-referenced tests only
give this estimation on a very limited number of items. A further advantage
is that although a testing domain may consist of 500 items, each student
takes a subtest of only 50 items. When all the subtest scores are combined,
a composite score is obtained on each objective across the multiples of
comparable items on all subtests.

As explained by Shoemaker (1974), "the results obtained from each subtest
are used to estimate all parameters of interest." (p. 178) This means that
on a single subdomain of items on an objective across all subtests, the
results obtained on the same item type in each subtest are averaged or
pooled to produce the single best estimate of that skill or concept.
Standard errors of estimate are computed for each estimated skill or concept
by using data obtained from all subtests.

The pooled estimate of a skill or concept is used to estimate the
distribution of scores on each subdomain.
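
As a rough illustration of pooling, the sketch below averages the proportion correct on one objective across subtests and attaches a standard error. The paper does not give the formulas used; the simple mean and the standard error of the mean shown here are assumptions standing in for Shoemaker's (1974) estimators, and the ten proportions are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_estimate(subtest_proportions):
    """Pool per-subtest proportions correct for a single objective.

    Returns (estimate, standard_error). The standard error is the sample
    standard deviation across subtests divided by the square root of the
    number of subtests (an assumed, simplified estimator).
    """
    estimate = mean(subtest_proportions)
    standard_error = stdev(subtest_proportions) / sqrt(len(subtest_proportions))
    return estimate, standard_error


# Hypothetical proportions correct on one objective from ten subtests.
proportions = [0.62, 0.58, 0.66, 0.60, 0.64, 0.59, 0.63, 0.61, 0.65, 0.57]
estimate, standard_error = pooled_estimate(proportions)
```

With these invented figures the pooled estimate is 0.615, meaning the group answered about 61.5 percent of the items on this objective correctly, and the standard error is a little under one percentage point.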

Establishing Cutoffs -- The process of determining which areas of the domain
should be completed with competency by any grade level in the programs is
something that will evolve out of the interaction of teachers with test
results. It was mentioned earlier that an advantage of working from a domain
of objectives is that personnel operating a program can select those areas
of the domain for which it is reasonable to be held responsible.

It will be necessary for teachers and administrators to have experience with
the three frameworks and the type of achievement data they provide before
decisions can be made at the school level about criteria for levels of
competency across the domain of objectives. Such competency criteria might
demonstrate readiness for students to exit from compensatory education programs.

Normative data on grade equivalents for achievement within a subdomain or
across the entire domain of objectives may help to establish such cutoffs.
This, of course, can only be done if it proves possible to obtain normative
data on the testing framework. (See above discussion "Normative Data From
the Framework?")

Determining Test Length -- The first experimental editions of FAIR and FAIM
were administered in the spring of 1976. At that time each of the subtests
for FAIR was 51 items long. When test results were analyzed after this
first field testing of the frameworks, it was found that significant numbers
of students were unable to complete the latter portions of the subtests.
Since FAIR and FAIM are domain-referenced tests, it is necessary that students
have sufficient time to attempt all items on an entire subtest. As a result,
the second experimental editions being administered in summer 1976 have been
revised to reduce their length as follows:

Framework for Assessment in Reading (FAIR):

The items from the National Assessment of Educational Progress materials
were eliminated. This reduced each subtest by five items, leaving 46
items in each. Since the items dropped were included only for comparison
purposes, the actual item pool represented in FAIR was not reduced.

Framework for Assessment in Mathematics (FAIM):

It was felt that the length of time involved in solving the mathematics
problems of FAIM subtests was at fault in students not completing all
items. Because reducing the actual item pool was not desired, a solution
was worked out whereby the ten original subtests were increased to fifteen
subtests. This was done by distributing the 500 items throughout fifteen
subtests in sequential order. In other words, the first fifteen items
of the pool were distributed one to each of the fifteen subtests.
Subsequent items were similarly distributed. (See figure below.) This
meant that each subtest no longer contained identical subdomains of
items. Scoring for this change will be handled by adjusting the
computer program used to analyze the data.
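The sequential distribution described above can be sketched as a simple rotation. This is an illustration only: it assumes the 500 items are numbered 1 through 500 in pool order, and the function name `distribute_items` is hypothetical rather than part of the district's scoring program.

```python
def distribute_items(n_items=500, n_subtests=15):
    """Deal item numbers 1..n_items to the subtests in rotation.

    Item 1 goes to subtest 1, item 2 to subtest 2, ..., item 15 to
    subtest 15, item 16 back to subtest 1, and so on.
    """
    subtests = [[] for _ in range(n_subtests)]
    for item in range(1, n_items + 1):
        subtests[(item - 1) % n_subtests].append(item)
    return subtests


subtests = distribute_items()
sizes = [len(s) for s in subtests]
first_items = subtests[0][:3]

print(sizes)        # number of items on each of the fifteen subtests
print(first_items)  # first three pool items assigned to subtest 1
```

Because 500 is not a multiple of 15, the rotation leaves five subtests with 34 items and ten with 33, so under this assumption subtest lengths differ by at most one item.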



[Figure 2. Revised Matrix for FAIM. Columns: subtests 1 through 15.
Entries: items by objective, assigned to the subtests in sequence
(A = Addition, S = Subtraction, M = Multiplication, D = Division,
and so on through 57 objectives).]



Framework for the Assessment of English Skills (FAIES):

The Framework for Assessment of English Skills was field tested in
January 1977. The length of subtests for this first edition is 30
items. Indications are that format and difficulty of items make it
impossible at this time to determine whether or not students could
complete a 30-item test with more appropriate items.



Identifying Unacceptable Items — The item response count obtained from the
field testing of the first experimental editions of the frameworks has been
analyzed for problems with individual test items. Consistent responses to
the same distractor in an item were noted. The item was then analyzed to
check on the possibility of misleading content or two correct answers.
Reading and mathematics content advisors from ESEA Title I field offices
in the district and ESEA Title I coordinators of reading and mathematics
programs in schools were called in to make judgments on those items. Items
found to be unacceptable by these groups were revised to meet the necessary
criteria. In some cases items were discovered that were responded to
correctly, but did not actually measure performance on the objective with
which they were identified. These were completely rewritten with new
content and formats.
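A screening step of the kind described above might look like the following sketch. The 40 percent threshold, the function name `flag_items`, the item labels, and the response data are all assumptions made for illustration; the paper does not state what level of agreement on a distractor caused an item to be referred to the content advisors.

```python
from collections import Counter


def flag_items(responses, answer_key, threshold=0.40):
    """Flag items whose most popular wrong option draws a large share.

    responses:  dict mapping item id -> list of options chosen
    answer_key: dict mapping item id -> keyed (correct) option
    threshold:  assumed cutoff, as a proportion of all responses
    """
    flagged = {}
    for item, choices in responses.items():
        if not choices:
            continue  # no responses recorded for this item
        counts = Counter(choices)
        wrong = {opt: n for opt, n in counts.items() if opt != answer_key[item]}
        if not wrong:
            continue  # every examinee chose the keyed answer
        option, n = max(wrong.items(), key=lambda pair: pair[1])
        share = n / len(choices)
        if share >= threshold:
            flagged[item] = (option, share)
    return flagged


# Hypothetical response counts for two items, both keyed "A".
responses = {
    "item_07": list("ABBBBCBBDB"),
    "item_12": list("AAACABADAA"),
}
answer_key = {"item_07": "A", "item_12": "A"}
flagged = flag_items(responses, answer_key)
print(flagged)
```

An item flagged this way is only a candidate for review; whether its content is misleading or has two correct answers remains a judgment for the content advisors.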



Determining Reliability and Validity of Subdomain Scores

Reliability: Coefficient alpha will be computed for each content
subdomain within each testing framework. The procedure used for
estimating the necessary components of variance (necessary statistics
for estimating coefficient alpha) when matrix sampling is used is
given by Shoemaker (1973).
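For orientation, the conventional form of coefficient alpha for complete data (every examinee answers every item) is sketched below. This is not the matrix-sampling procedure of Shoemaker (1973), which estimates the variance components from incomplete data and is not reproduced here; the function name and the small score matrix are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha from a complete examinee-by-item score matrix.

    scores: list of rows, one per examinee, each a list of item scores
            (for example 1 = correct, 0 = incorrect).
    """
    k = len(scores[0])  # number of items in the subdomain
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four examinees, three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(scores)
print(alpha)
```

The sketch computes alpha as k/(k-1) times one minus the ratio of summed item variances to total-score variance; the same ratio results whether population or sample variances are used, provided the choice is consistent.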

Validity: The testing frameworks developed here are content valid
in that the content of each framework is that agreed upon by all
advisory panel members as representing what is or should be taught by
teachers to Title I students. Each framework additionally demonstrates
construct validity because the associated items are operational
definitions of constructs defined by the content map.

Try Out and Use of Tests

Field Testing of the Frameworks — Both FAIR and FAIM have already been
field tested in the spring and summer of 1976, and January of 1977. Item
analysis by computer and critiques on item content and test format by
teachers and program advisors for the two subject areas have been obtained.

The second and third editions incorporated changes in test directions,
illustrations, distractors and any portions of item content that may have
been misleading or unrepresentative of the objectives.

The field testing of these second and third experimental editions is still
a test of the test. Therefore, the results will not be used to assess
student achievement. Instead, they will be used to detect further needs for
revision.

The groups involved in these uses of the frameworks are as follows:

1) all ESEA Title I students in grades 7 through 12

2) 2,000 sixth grade ESEA Title I students

3) 2,400 control students from grades 6 (1,200) and 8 (1,200) in schools
not having ESEA Title I programs

Each time the frameworks are administered, one-third of each group listed
above will be given FAIR, one-third FAIM, and one-third FAIES. Selection
of subgroups of students to receive the three frameworks is made randomly.
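The random division into thirds can be sketched as follows. The shuffle-and-rotate method, the function name `assign_frameworks`, the fixed seed, and the student identifiers are assumptions for illustration; the paper states only that the selection is random.

```python
import random
from collections import Counter


def assign_frameworks(student_ids, seed=None):
    """Randomly assign each student in one group to a framework.

    Shuffling the group and then rotating through the three frameworks
    gives subgroup sizes that differ by at most one student.
    """
    frameworks = ("FAIR", "FAIM", "FAIES")
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: frameworks[i % 3] for i, sid in enumerate(shuffled)}


# Hypothetical example: the 2,400 control students, numbered 1 to 2400.
control_students = range(1, 2401)
assignment = assign_frameworks(control_students, seed=1977)
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Running the assignment separately within each of the three groups listed above keeps every group represented by one-third of its students on each framework.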

Norming of the Frameworks — Over a three-year period, program assessment
through the continued parallel use of the Comprehensive Tests of Basic
Skills (CTBS) will be compared with results from FAIR and FAIM. Overall
CTBS results will be compared to the number of items answered correctly on
the two frameworks in an attempt to extract normative data from FAIR and
FAIM. There will be an attempt to establish local norms based on the use of



Revision of Items for the Final Edition of the Framework — During the
course of the first year using FAIR, FAIM, and FAIES, a special type of
revision process is taking place. Selected students and teachers involved
in the secondary compensatory education programs are being asked to
participate in workshops for item revision. The object is to alter items
where possible to be more relevant to the interests, maturity level, and
cultural backgrounds of the students in the programs.

Suggestions are being asked for on changes in illustrations, contents of
paragraphs for comprehension items, and contents of graphs and word problems.
It is hoped that such revisions will draw upon current interests of the
students, and add relevance and humor to the items, allowing students to
identify with the contents of the subtests.



Effects of Classroom Environment on Achievement — Test results obtained
from the fall 1977 testing will be analyzed to determine the objectives on
which students can and cannot perform.

Schools with poor results and schools with very good results (in both
experimental and control groups) will be identified. Classrooms from these
groups will then be randomly selected for a study of those attributes that
comprise the instructional program in that setting. Such factors as
management systems used, content covered, and personnel used will be noted.
Observations will be made of what students, teachers, and administrators do
while in the classroom setting. These evaluations will be conducted over a
three-year period to determine what progressive effects result from
converting to domain-referenced testing and its use over that period of time.

Applicability of Domain-Referenced Test Results — After test results are
released to examinees in the fall, each teacher will receive a questionnaire
asking them to rate and describe the value of domain-referenced test results
to their program. Of interest to this project will be effects of the
achievement information on such things as classroom practices, teaching
methods, student attainment and general program organization.

The same questionnaire will be administered to the teachers after the
spring testing results are released.

Also included will be questions about the value of test results obtained
from domain-referenced tests versus those obtained from norm-referenced
tests.

Training of Teachers for Use of the Frameworks — Two types of inservice
training will be necessary to implement effective use of the frameworks.
These will involve the following:

1) Instructing teachers on the administration of the framework

2) Explaining the meaning of the test results, how they differ from
norm-referenced test results, and how the results may be applied
to the program to make instructional decisions.

Parent Inservice — The parents of examinees will be offered information on
the characteristics and intent of the domain-referenced tests. Differences
from norm-referenced tests will be discussed and test results from use of
the frameworks will be explained.






References to the Literature

Millman, Jason. "Passing Scores and Test Lengths for Domain-Referenced
Measures." Paper presented at the Annual Meeting of the American Educational
Research Association (Chicago, Illinois, April 1972). ERIC Report number
ED 065 555, U.S. Dept. of HEW, O.E., Educational Resources Information
Center, Washington, D.C. 20202.

Millman, Jason. "Criterion-Referenced Measurement." Evaluation in Education,
W. James Popham, Editor. McCutchan Publishing Corporation, Berkeley,
California, 1974, p. 309.

Roudabush, Glenn. "Normative Data from a CRT?" CRITERIA, CTB Newsletter on
Evaluation, no. 8. Published by CTB/McGraw-Hill, Monterey, California, 1974.

Shoemaker, D. M. Principles and Procedures of Multiple Matrix Sampling.
Cambridge, Mass.: Ballinger Publishing Company, 1973.

Shoemaker, David M. Toward a Framework for Achievement Testing, unpublished
manuscript, second draft, 1974; through Southwest Regional Laboratory for
Educational Research and Development.

Shoemaker, David M. "Toward a Framework for Achievement Testing," Review of
Educational Research, published by American Educational Research
Association, 1975.






